
RFC: Refactor LatentGP API slightly #216

Draft · wants to merge 3 commits into master
Conversation

willtebbutt (Member) commented Aug 25, 2021

As discussed in #202

I don't want to merge this until we're ready to make another breaking release (I couldn't figure out how to do this without it being breaking).

edit: my only reservation about this PR is that having to provide a function which returns a function which returns a distribution feels a bit complicated. Not sure whether there's a better approach. See the tests for an example -- it's just rather complicated for humans to interact with.

@@ -3,10 +3,9 @@
x = rand(10)
y = rand(10)

-lgp = LatentGP(gp, x -> MvNormal(x, 0.1), 1e-5)
+lgp = LatentGP(gp, x -> (f -> MvNormal(f, 0.1)), 1e-5)
A Member commented on this line:

This constructor of MvNormal was deprecated (JuliaStats/Distributions.jl#1362); it should be replaced with, e.g.,

Suggested change
lgp = LatentGP(gp, x -> (f -> MvNormal(f, 0.1)), 1e-5)
lgp = LatentGP(gp, x -> (f -> MvNormal(f, 0.01 * I)), 1e-5)
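For reference, a minimal self-contained sketch of the non-deprecated form (it needs LinearAlgebra for I; 0.01 * I is the isotropic covariance corresponding to a standard deviation of 0.1):

using Distributions, LinearAlgebra

f = randn(10)
# Old (deprecated): MvNormal(f, 0.1), where 0.1 was interpreted as a standard deviation.
# New: pass a covariance explicitly, e.g. an isotropic one built from UniformScaling.
d = MvNormal(f, 0.01 * I)
rand(d)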

st-- (Member) commented Aug 26, 2021

A Haskell coder would say that's just how functions work 😂 x -> f -> lik

But I do agree it's a bit cumbersome here. What was your motivation for doing it this way?

What do you think of typing up some concrete use-cases? (Not the full implementation, just how the API usage might look.)

I'm assuming you are thinking of some likelihood where location = gp(x) and scale = deterministic(x). How would we then handle location = gp1(x) and scale = gp2(x)?
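To make the two shapes being compared concrete, a small illustration (the Gaussian likelihood here is just a placeholder):

using Distributions, LinearAlgebra

# Curried form proposed in this PR: apply the inputs x first, get back a function of f.
lik_curried = x -> (f -> MvNormal(f, 0.01 * I))

# Uncurried alternative: take the inputs and the latent values together.
lik_uncurried = (x, f) -> MvNormal(f, 0.01 * I)

x, f = rand(10), randn(10)
d1 = lik_curried(x)(f)   # apply x, then f
d2 = lik_uncurried(x, f) # apply both at once; same distribution either way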

st-- (Member) commented Aug 26, 2021

What would stop us from simply passing the inputs x directly to the likelihood object when calling it? The LatentFiniteGP already carries x with it, so we could extract it from there in rand and logpdf.

Edited to add: We could also define a trait for "wants to be passed the inputs" so all the likelihoods that don't actually care about x (most of them at this point!) wouldn't have to take a dummy argument.
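A rough sketch of how such a trait could look; every name below is hypothetical, nothing like this exists in the package yet:

# Trait: does a likelihood want to see the inputs x, or only the latent values f?
abstract type InputDependence end
struct NeedsInputs <: InputDependence end
struct IgnoresInputs <: InputDependence end

# Default: most likelihoods ignore x and simply map f to a distribution.
input_dependence(lik) = IgnoresInputs()

# Build the "likelihood at x" without forcing x-agnostic likelihoods to take a dummy argument.
materialise_lik(lik, x) = materialise_lik(input_dependence(lik), lik, x)
materialise_lik(::IgnoresInputs, lik, x) = lik           # already a function of f
materialise_lik(::NeedsInputs, lik, x) = f -> lik(x, f)  # close over the inputs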

willtebbutt (Member, Author) commented Aug 26, 2021

But I do agree it's a bit cumbersome here. What was your motivation for doing it this way?

It just feels really clean / natural to mirror the FiniteGP structure. The idea that you have likelihoods which cover the entire space, and then you turn them into a finite-dimensional thing once you have some inputs.

What would stop us from simply passing the inputs x directly to the likelihood object when calling it? The LatentFiniteGP already carries x with it, so we could extract it from there in rand and logpdf.

I guess this relates to the above. This would also be a perfectly reasonable way to do this.

Whichever we went with, I think I would rather be consistent. Either we have

struct LatentFiniteGP{Tlgp<:LatentGP, Tx<:AbstractVector}
    lgp::Tlgp
    x::Tx
end

and build everything on the fly (which is totally fine, it's all essentially zero-overhead), or we do

struct LatentFiniteGP{Tfx<:LatentGP, Tlikx}
    fx::Tfx
    likx::Tlikx
end

in which both fx and likx already have x baked in to them.

I can't say that I'm overly fussed either way. Do you have a preference, @st--?
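(For what it's worth, "build everything on the fly" for the first option could be sketched roughly like this; the field names follow the struct above, and rand_latent is just an illustrative stand-in, not the real rand implementation:)

using Random

function rand_latent(rng::Random.AbstractRNG, lfgp)
    lgp = lfgp.lgp
    fx = lgp.f(lfgp.x, lgp.Σy)        # FiniteGP built on demand from the stored inputs
    f = rand(rng, fx)                 # latent sample at x
    y = rand(rng, lgp.lik(lfgp.x)(f)) # curried likelihood: apply x, then f
    return (f=f, y=y)
end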

What do you think of typing up some concrete use-cases? (not the full implementation, just how the api usage might look)

I've not thought especially about how users will find using this -- we can always build utilities on top of the API provided that it supports everything that we need. In terms of low-level usage, I was imagining something like (as a heteroscedastic example):

f = GP(Matern52Kernel())
g = my_favourite_positive_valued_function
jitter = 1e-6
latent_gp = LatentGP(f, x -> (f -> MvNormal(f, map(g, x))), jitter)
elbo(approx, latent_gp(x), y)

or something qualitatively like this.
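A slightly more concrete version of that sketch (with an arbitrary choice of g, and using the non-deprecated MvNormal constructor; the LatentGP call assumes the curried API proposed in this PR):

using AbstractGPs, Distributions, LinearAlgebra

f = GP(Matern52Kernel())
g = xi -> 0.1 + xi^2   # any positive-valued function of a single input will do
jitter = 1e-6
# Curried likelihood: x fixes the per-point noise variances, f fixes the mean.
lik = x -> (f -> MvNormal(f, Diagonal(map(g, x))))
latent_gp = LatentGP(f, lik, jitter)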

I'm assuming you are thinking of some likelihood where location = gp(x) and scale = deterministic(x). How would we then handle location = gp1(x) and scale = gp2(x)?

You read my mind. For example

f = GP(LinearMixingModelKernel(randn(2, 2), [Matern52Kernel(), Matern52Kernel()]))
lik(x::MOInputIsotopicByOutput) = (f -> MvNormal(f(x[1:N]), map(exp, f(x[N+1:end]))))
jitter = 1e-6
latent_gp = LatentGP(f, lik, jitter)

This second example is really quite unpleasant, so we will definitely need utilities built on top of this to make it usable in practice, but hopefully the gist is clear. The point is just that we would now have the flexibility to express something like this.

(Incidentally, one of the things I like about this is that it doesn't privilege mean-field-per-process approximate posteriors.)

We could also define a trait for "wants to be passed the inputs" so all the likelihoods that don't actually care about x (most of them at this point!) wouldn't have to take a dummy argument.

I like this suggestion, as I agree that it would be sad if likelihoods that don't care about inputs had to care about them.

edit: the more that I think about it, the more that I like

struct LatentFiniteGP{Tlgp<:LatentGP, Tx<:AbstractVector}
    lgp::Tlgp
    x::Tx
end

as the structure of the LatentFiniteGP. It mirrors the FiniteGP structure really nicely, and is very clean.

edit2: the only other reason to like the x -> f -> lik over (x, f) -> lik is that if you want to compute the likelihood for multiple values of f, there might be some computations that you can cache across all samples, if x is fixed. Not sure how much of a concern that is in practice, but we could presumably find an example where it matters a lot if we thought hard enough (e.g. throw something expensive like a neural network in the likelihood somewhere, and make it only depend on x)
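(As a toy illustration of that caching point, with a made-up expensive_input_transform standing in for the x-only neural network:)

using Distributions, LinearAlgebra

# With the curried form, x-only work runs once per set of inputs ...
lik = x -> begin
    ϕ = expensive_input_transform(x)  # hypothetical, depends only on x (e.g. a NN forward pass)
    # ... and is captured by the closure, so every latent sample f reuses it.
    f -> MvNormal(f, Diagonal(ϕ))
end
# With an uncurried (x, f) -> lik signature, a naive implementation would redo that work per f.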

devmotion (Member) commented:

edit2: the only other reason to like the x -> f -> lik over (x, f) -> lik is that if you want to compute the likelihood for multiple values of f, there might be some computations that you can cache across all samples, if x is fixed. Not sure how much of a concern that is in practice, but we could presumably find an example where it matters a lot if we thought hard enough (e.g. throw something expensive like a neural network in the likelihood somewhere, and make it only depend on x)

Not sure if it is a good idea but maybe one could dispatch on Base.Fix1{typeof(lik)} or more generally Base.Fix1{<:LikelihoodFunction} internally in a path that involves evaluations at the same x for multiple f.
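A minimal sketch of that dispatch idea (the helper name and the caching path are purely illustrative):

# Generic path: nothing is known about how the curried likelihood was built.
eval_lik_over_samples(likx, fs) = map(likx, fs)

# Specialised path: if the user wrote Base.Fix1(lik, x), both lik and x are recoverable
# (Fix1 stores them in its fields f and x), so x-only work could be cached across all fs.
function eval_lik_over_samples(likx::Base.Fix1, fs)
    lik, x = likx.f, likx.x
    # ... cache anything that depends only on x here ...
    return map(f -> lik(x, f), fs)
end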

willtebbutt (Member, Author) commented:

Not sure if it is a good idea but maybe one could dispatch on Base.Fix1{typeof(lik)} or more generally Base.Fix1{<:LikelihoodFunction} internally in a path that involves evaluations at the same x for multiple f.

That would definitely do it. Do we know whether Fix1 is an official part of the API? I had always assumed it's not exported because the Julia devs don't really want people depending on it...

st-- (Member) commented Aug 27, 2021

@willtebbutt your two examples seem inconsistent (and that's why I wanted to see some mock code usage!): in your first example, f is the concrete evaluation of the function at x (e.g. rand(gp(x, jitter)), or Gauss-Hermite quadrature points corresponding to the marginal distribution), and that makes sense to me. In the second example, however, it looks like f is the (non-Finite) GP object itself, which you then evaluate at different inputs. But that would mean you're passing a FiniteGP to the MvNormal. How would that work?

willtebbutt (Member, Author) commented Aug 27, 2021

You're completely correct. Good point. The second example should instead be something like

f = GP(LinearMixingModelKernel(randn(2, 2), [Matern52Kernel(), Matern52Kernel()]))
lik(x::MOInputIsotopicByOutput) = (f::AbstractVector{<:Real} -> MvNormal(f[1:N], map(exp, f[N+1:end])))
jitter = 1e-6
latent_gp = LatentGP(f, lik, jitter)

The point here is that the type of x tells you that the first N entries correspond to the first output, and the second N entries to the second (hence the need for x). You could also shove a consistency check in to ensure that length(x) == length(f), and to check that there are only two outputs in x. This does feel like something that has to be carefully orchestrated though.
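Those consistency checks might be sketched like this (assuming the MO input type carries the usual x and out_dim fields; everything else is illustrative):

using Distributions, LinearAlgebra

function lik(x::MOInputIsotopicByOutput)
    x.out_dim == 2 || throw(ArgumentError("this likelihood expects exactly two outputs"))
    N = length(x.x)   # number of underlying input features
    return function (f::AbstractVector{<:Real})
        length(f) == length(x) || throw(DimensionMismatch("length(f) must equal length(x)"))
        return MvNormal(f[1:N], Diagonal(map(exp, f[N+1:end])))
    end
end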

st-- (Member) commented Aug 27, 2021

lik(x::MOInputIsotopicByOutput) = (f::AbstractVector{<:Real} -> MvNormal(f[1:N], map(exp, f[N+1:end])))

Made me wonder if perhaps we could unify the two isotopic MO inputs into a single parametric type, so that we can pass-through dispatch on the type of isotopy, e.g. MOInputIsotopic{T} where T <: Union{ByOutput, ByFeature}, and

lik(x::MOInputIsotopic{T}) where {T} = (f::AbstractVector{<:Real} -> MvNormal(get_mo_output(T, f, 1), map(exp, get_mo_output(T, f, 2))))

? but actually for that to work you do need to know how many different outputs there are ... so maybe a MOInputIsotopic.type struct field that could then contain out_dim, and we have

lik(x::MOInputIsotopic) = (f::AbstractVector{<:Real} -> MvNormal(get_mo_output(x.type, f, 1), map(exp, get_mo_output(x.type, f, 2))))

but it still seems pretty clunky.
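For concreteness, that unified type might be sketched roughly as (entirely hypothetical, not the current KernelFunctions API):

abstract type IsotopicOrdering end
struct ByOutput <: IsotopicOrdering end
struct ByFeature <: IsotopicOrdering end

# One isotopic MO input type, parametrised by how the (feature, output) pairs are ordered.
struct MOInputIsotopic{T<:IsotopicOrdering,V<:AbstractVector}
    x::V          # underlying input features
    out_dim::Int  # number of outputs
end
MOInputIsotopic{T}(x::V, out_dim) where {T,V} = MOInputIsotopic{T,V}(x, out_dim)

# Pull the block of latent values belonging to output i out of the flat vector f.
get_mo_output(x::MOInputIsotopic{ByOutput}, f, i) = f[(i - 1) * length(x.x) .+ (1:length(x.x))]
get_mo_output(x::MOInputIsotopic{ByFeature}, f, i) = f[i:x.out_dim:end]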

And how would it work if you want to build a heterotopic multi-output heteroskedastic model, so if there are observations y1 and y2 which you may observe at different inputs, but wherever you observe y1 you need both f1 and f2, and where you observe y2 you also need f3 and f4? 😂

willtebbutt (Member, Author) commented Aug 27, 2021

And how would it work if you want to build a heterotopic multi-output heteroskedastic model, so if there are observations y1 and y2 which you may observe at different inputs, but wherever you observe y1 you need both f1 and f2, and where you observe y2 you also need f3 and f4? 😂

Yeah, this is a good point. Perhaps this could be achieved by nesting multi-output models? e.g.

GP(LinearMixingModelKernel([
    LinearMixingModelKernel(kernels_for_mean_processes),
    LinearMixingModelKernel(kernels_for_variance_processes),
]))

Might need to be nested the other way around, but hopefully the idea is clear.

An input for this would be something like

julia> feature_1, feature_2, feature_3 = 1.0, 2.0, 3.0
(1.0, 2.0, 3.0)

julia> MOInputIsotopicByOutputs([(feature_1, 1), (feature_2, 2), (feature_3, 1)], 2)
6-element MOInputIsotopicByOutputs{Tuple{Float64, Int64}, Vector{Tuple{Float64, Int64}}}:
 ((1.0, 1), 1)
 ((2.0, 2), 1)
 ((3.0, 1), 1)
 ((1.0, 1), 2)
 ((2.0, 2), 2)
 ((3.0, 1), 2)

or whatever.

Made me wonder if perhaps we could unify the two isotopic MO inputs into a single parametric type, so that we can pass-through dispatch on the type of isotopy, e.g. MOInputIsotopic{T} where T <: Union{ByOutput, ByFeature}, and

This could make sense. I wonder what the tradeoffs are here vs the supertype option that we've discussed before?

edit: I missed the mixing matrices from the kernels above. I was imagining they'd just be the identity matrix here, although maybe there are situations in which you want your noise variance to be proportional to the mean...

theogf (Member) commented Aug 27, 2021

What about having an InputDependentLikelihood which takes a function of the form you proposed:

# Wrapper marking a likelihood whose construction depends on the inputs x.
struct InputDependentLikelihood{F}
    f::F
end

# Plain likelihoods are returned unchanged; input-dependent ones are applied to x.
to_likelihood(lik, x) = lik
to_likelihood(lik::InputDependentLikelihood, x) = lik.f(x)

# Inside (lgp::LatentGP)(x):
LatentFiniteGP(lgp.f(x, lgp.Σy), to_likelihood(lgp.lik, x))

Since input-dependent likelihoods are rarer, it would make sense for them to be the ones requiring a bit of extra effort to set up?
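Usage under that sketch might then look something like (illustrative; g is any positive-valued function as in the heteroscedastic example above):

# Input-independent likelihood: unchanged, just a function of the latent values f.
lgp_homo = LatentGP(f, f -> MvNormal(f, 0.01 * I), 1e-6)

# Input-dependent likelihood: opt in explicitly by wrapping the curried form.
hetero_lik = InputDependentLikelihood(x -> (f -> MvNormal(f, Diagonal(map(g, x)))))
lgp_hetero = LatentGP(f, hetero_lik, 1e-6)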

st-- (Member) commented Oct 11, 2021

NB: lgp.Σy seems rather misleading; it has nothing to do with y. Could we just call the field jitter instead, to be more explicit?
